The cluster is broken. Something is wrong with scaling Pods. We just tried scaling the deployment to 2 replicas, but it's not happening. Troubleshoot and fix the issue.
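For context, the scale attempt would have looked something like this (my-deployment here is a placeholder, since the question doesn't tell us the actual Deployment name):
$ kubectl scale deployment my-deployment --replicas=2
The command itself succeeds, because it only updates the Deployment's spec through the API server; the new Pod just never shows up, which already hints the problem is on the control plane side.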
The question tells us directly what's broken in this cluster: the Deployment can't scale its Pods to 2 replicas. We don't yet know what's causing it, but we do know that the component responsible for scaling in K8s is kube-controller-manager, so let's first check whether that component is working:
$ kubectl get po -n kube-system
NAME                               READY   STATUS             RESTARTS   AGE
...
kube-apiserver-g8master            1/1     Running            1          6m7s
kube-controller-manager-g8master   0/1     CrashLoopBackOff   4          2m34s
kube-proxy-h2smv                   1/1     Running            0          5m49s
...
Sure enough, kube-controller-manager has crashed. Let's start with kubectl describe to see what happened to it:
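Using the Pod name from the listing above:
$ kubectl describe po kube-controller-manager-g8master -n kube-system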
Events:
  Type     Reason   Age                     From             Message
  ----     ------   ----                    ----             -------
  Normal   Pulled   7m12s (x5 over 8m41s)   kubelet, master  Container image "k8s.gcr.io/kube-controller-manager:v1.18.0" already present on machine
  Normal   Created  7m12s (x5 over 8m41s)   kubelet, master  Created container kube-controller-manager
  Normal   Started  7m12s (x5 over 8m41s)   kubelet, master  Started container kube-controller-manager
  Warning  BackOff  3m33s (x27 over 8m37s)  kubelet, master  Back-off restarting failed container
Hmm, hard to tell what the problem is from this XD. Let's dig into the log file then:
$ kubectl logs -n kube-system kube-controller-manager-g8master
I0910 08:22:45.990175 1 serving.go:313] Generated self-signed cert in-memory
unable to load client CA file "/etc/kubernetes/pki/ca.crt": open /etc/kubernetes/pki/ca.crt: no such file or directory
Found the problem! It looks like the ca.crt certificate file is missing. Let's check whether the file actually exists in that directory:
$ ls /etc/kubernetes/pki/
apiserver-etcd-client.crt apiserver-kubelet-client.crt apiserver.crt ca.crt etcd front-proxy-ca.key front-proxy-client.key sa.pub
apiserver-etcd-client.key apiserver-kubelet-client.key apiserver.key ca.key front-proxy-ca.crt front-proxy-client.crt sa.key
It's there! We found ca.crt. So why does kube-controller-manager claim it can't find it? If the file really does exist in that directory, but kube-controller-manager looks and sees nothing, then something fishy is going on: most likely the volume mount path in kube-controller-manager is misconfigured, so it's looking in the wrong directory, where naturally there's no ca.crt. Could that be it? Let's verify:
$ vim /etc/kubernetes/manifests/kube-controller-manager.yaml
...
    volumeMounts:
    ...
    ## find the name of the volumeMount
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    ...
...
  ## further down, find the hostPath of the volume named k8s-certs
  volumes:
  ...
  - hostPath:
      path: /etc/kubernetes/aaaaa
      type: DirectoryOrCreate
    name: k8s-certs
  ...
...
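By the way, instead of scrolling through the file in vim, you can also grep the manifest directly (just a convenience, assuming GNU grep is available on the node):
$ grep -A 3 hostPath /etc/kubernetes/manifests/kube-controller-manager.yaml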
We can see that the hostPath, i.e. path: /etc/kubernetes/aaaaa, is set wrong~ No wonder the file can't be found, that directory doesn't even exist! Change it to the correct path and we're good~
$ vim /etc/kubernetes/manifests/kube-controller-manager.yaml
...
  volumes:
  ...
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  ...
...
Then just wait a little while for the kubelet to recreate the Static Pod, and you'll see kube-controller-manager come back up and work normally (no need to delete and recreate the Pod yourself)~
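To double-check, you can watch the Pod come back and then confirm the Deployment finally scales (a minimal sketch; my-deployment is again a placeholder name):
$ kubectl get po -n kube-system -w
$ kubectl get deploy my-deployment
Once the controllers are healthy again, the Deployment should show READY 2/2.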
Because kube-controller-manager is a Static Pod, its YAML manifest lives by default under /etc/kubernetes/manifests/, so we can edit the settings in kube-controller-manager.yaml directly. Other Static Pods such as etcd, kube-apiserver, and kube-scheduler can be modified the same way.
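For example, on a typical kubeadm cluster the manifests directory contains one YAML file per control-plane component (sample output from a standard kubeadm setup; yours may vary):
$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml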
What if the Static Pod path has been changed? How do we track it down? We covered this back in 【Day6】奇怪的Pod - Static Pod, so feel free to go back and review~
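As a quick refresher, one common way to find it is to check the kubelet's config file (the config path below assumes a kubeadm-style setup):
$ grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests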
Today we looked at Control Plane Failure problems. These include scheduling anomalies, Pods that can't be created or deleted, failure to scale, and so on. In cases like this the culprit is usually a broken component under the kube-system namespace, and those components are all Static Pods, so you need to debug from the Static Pod perspective. Also watch out for the Static Pod path (luckily the path wasn't changed in this question; some questions modify the Static Pod path first, making you spend extra time hunting through the config) and for issues with how they get created. Overall it's not too hard, you just need to be careful. OK, that's it for today~ Thanks everyone~